Investigate the Performance of Document Clustering Approach Based on Association Rules Mining

نویسندگان

  • Noha Negm
  • Mohamed Amin
  • Passent Elkafrawy
  • Abdel Badeeh M. Salem
چکیده

The challenges of the standard clustering methods and the weaknesses of Apriori algorithm in frequent termset clustering formulate the goal of our research. Based on Association Rules mining, an efficient approach for Web Document Clustering (ARWDC) has been devised. An efficient Multi-Tire Hashing Frequent Termsets algorithm (MTHFT) has been used to improve the efficiency of mining association rules by targeting improvement in mining of frequent termset. Then, the documents are initially partitioned based on association rules. Since a document usually contains more than one frequent termset, the same document may appear in multiple initial partitions, i.e., initial partitions are overlapping. After making partitions disjoint, the documents are grouped within the partition using descriptive keywords, the resultant clusters are obtained effectively. In this paper, we have presented an extensive analysis of the ARWDC approach for different sizes of Reuter’s datasets. Furthermore the performance of our approach is evaluated with the help of evaluation measures such as, Precision, Recall and F-measure compared to the existing clustering algorithms like Bisecting K-means and FIHC. The experimental results show that the efficiency, scalability and accuracy of the ARWDC approach has been improved significantly for Reuters datasets. Keywords—Web Document Clustering; Knowledge Discovery; Association Rules Mining; Frequent termsets; Apriori algorithm; Text Documents; Text Mining; Data Mining

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Retaining Customers Using Clustering and Association Rules in Insurance Industry: A Case Study

This study clusters customers and finds the characteristics of different groups in a life insurance company in order to find a way for prediction of customer behavior based on payment. The approach is to use clustering and association rules based on CRISP-DM methodology in data mining. The researcher could classify customers of each policy in three different clusters, using association rules. A...

متن کامل

A new approach based on data envelopment analysis with double frontiers for ranking the discovered rules from data mining

Data envelopment analysis (DEA) is a relatively new data oriented approach to evaluate performance of a set of peer entities called decision-making units (DMUs) that convert multiple inputs into multiple outputs. Within a relative limited period, DEA has been converted into a strong quantitative and analytical tool to measure and evaluate performance. In an article written by Toloo et al. (2009...

متن کامل

Applying a decision support system for accident analysis by using data mining approach: A case study on one of the Iranian manufactures

Uncertain and stochastic states have been always taken into consideration in the fields of risk management and accident, like other fields of industrial engineering, and have made decision making difficult and complicated for managers in corrective action selection and control measure approach. In this research, huge data sets of the accidents of a manufacturing and industrial unit have been st...

متن کامل

Developing a Course Recommender by Combining Clustering and Fuzzy Association Rules

Each semester, students go through the process of selecting appropriate courses. It is difficult to find information about each course and ultimately make decisions. The objective of this paper is to design a course recommender model which takes student characteristics into account to recommend appropriate courses. The model uses clustering to identify students with similar interests and skills...

متن کامل

An Efficient Hash-based Association Rule Mining Approach for Document Clustering

Document clustering is one of the important research issues in the field of text mining, where the documents are grouped without predefined categories or labels. High dimensionality is a major challenge in document clustering. Some of the recent algorithms address this problem by using frequent term sets for clustering. This paper proposes a new methodology for document clustering based on Asso...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013